Skip to content

Lot's of agent stuff again#343

Open
satti-hari-krishna-reddy wants to merge 20 commits intoShuffle:mainfrom
satti-hari-krishna-reddy:timeout-fix
Open

Lot's of agent stuff again#343
satti-hari-krishna-reddy wants to merge 20 commits intoShuffle:mainfrom
satti-hari-krishna-reddy:timeout-fix

Conversation

@satti-hari-krishna-reddy
Copy link
Copy Markdown
Collaborator

@satti-hari-krishna-reddy satti-hari-krishna-reddy commented Mar 12, 2026

Here are the key changes:

  • Changed abort status from "FINISHED" to "ABORTED" throughout the codebase for clearer execution state tracking

  • Moved abort logic into the abortAgentExecution() function, replacing repetitive error handling code in ~7 locations

  • Fixed Action Reference bug where the wrong action object was being passed to the agent continuation logic. Previously, it was using actionResult.Action, but now it properly retrieves the original action from the execution results array, ensuring the agent has the correct state and context when continuing execution. (Look at line 17846 in shared.go)

  • Added context (ctx) as a parameter to HandleAiAgentExecutionStart()

  • Added validation for caller function info with trace ID logging for better debugging

  • Added "http" into the list of available apps for agent decisions), making the agent work with HTTP functionality again.

  • Added AI_AGENT_LLM_FAILURE log prefix for all LLM-related errors

  • Includes detailed error context: org ID, status codes, error types, and raw responses

  • Helps distinguish between different failure modes (auth errors, rate limits, parsing errors, etc.)

  • Added caller function and trace ID logging for agent starts

  • Better separation between ABORTED, FAILURE, and FINISHED states

Comment thread shared.go Outdated
formattedAppName := strings.ReplaceAll(strings.ToLower(app.Name), " ", "_")

isInternalShuffleApp := false
switch formattedAppName {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something about just matching name seems off, but I'm not sure exactly why.

I wonder if there is a way to "steal" the URL field/apikey by formatting an app in a certain way.

Just need to think it over before merging 🤔

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think having a special verified field for apps built by support users would be better. Given that we're already overriding the URL field when injecting the auth, if we compare additionally to this special field and both are true, we can trust it. What do you think?

Comment thread shared.go

returnAction, err := HandleAiAgentExecutionStart(workflowExecution, actionResult.Action, true)
var originalAction Action
if foundActionResultIndex >= 0 && foundActionResultIndex < len(workflowExecution.Results) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the point of this? I'm always confused when we override actions, when things used to work without it

Copy link
Copy Markdown
Collaborator Author

@satti-hari-krishna-reddy satti-hari-krishna-reddy Mar 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At line 8295 in ai.go, we are building actionResult, and inside of that we are setting the action's label name as fmt.Sprintf("Agent Decision %s", decision.RunDetails.Id). Then we call handleAgentDecisionStreamResult, and that one passes actionResult to HandleAiAgentExecutionStart. Now startNode is overwritten with an action with label, e.g., "Agent Decision RNnBU2aY". Due to this, the later nodes in workflow that are trying to refer to the AI agent node with its original label can't find it, resulting in being unable to get any data from the agent node at all.

@satti-hari-krishna-reddy satti-hari-krishna-reddy marked this pull request as draft April 1, 2026 08:29
@satti-hari-krishna-reddy satti-hari-krishna-reddy marked this pull request as ready for review April 6, 2026 06:29
@satti-hari-krishna-reddy satti-hari-krishna-reddy changed the title Fix agent stuck in waiting state after completion Lot's of agent stuff again Apr 6, 2026
@satti-hari-krishna-reddy
Copy link
Copy Markdown
Collaborator Author

@copilot explain the changes made in this PR

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens up AI agent execution lifecycle handling by adding trace/caller metadata propagation, improving abort/failure state reporting, and reducing duplicate/self-request/log noise via caching.

Changes:

  • Add X-Internal-Caller / X-Trace-ID support (CORS + internal requests) and thread context.Context through agent/automation entry points.
  • Adjust agent self-request caching TTLs and reduce repeated “abort due to result failure” logging via cache-based de-duplication.
  • Refactor agent early termination into abortAgentExecution, improve LLM error logging, and add recovery behavior when LLM returns malformed decision JSON.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
shared.go Adds CORS headers, tweaks agent self-request caching TTL, propagates ctx into agent start paths.
db-connector.go Adds cache-based log de-dupe for abort logging; passes ctx into datastore automation runner.
codegen.go Threads ctx into datastore automation runner; adds trace-id propagation headers to internal agent start request.
ai.go Introduces abortAgentExecution, changes abort semantics to ABORTED, improves LLM failure handling/logging and decision parsing recovery.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ai.go
Comment on lines +7839 to +7842
traceID, _ := ctx.Value("trace_id").(string)
if strings.TrimSpace(caller) == "" {
log.Printf("ERROR[%s] AI agent: No caller function info provided for AI agent, aborting the request ...", execution.ExecutionId)
return abortAgentExecution(ctx, execution, startNode, AgentOutput{}, "no_caller_info", "No caller function info provided for AI Agent start")
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HandleAiAgentExecutionStart now aborts if ctx is missing a non-empty "caller" value. In this repo, GetContext(request) currently always returns context.Background() (no values), so this change will cause agent starts to abort in normal request flows unless every call site/middleware injects ctx.Value("caller"). Consider making caller optional (log a warning + default), or ensure all entry points populate it (e.g., derive from an HTTP header like X-Internal-Caller, or set it at the immediate call sites).

Suggested change
traceID, _ := ctx.Value("trace_id").(string)
if strings.TrimSpace(caller) == "" {
log.Printf("ERROR[%s] AI agent: No caller function info provided for AI agent, aborting the request ...", execution.ExecutionId)
return abortAgentExecution(ctx, execution, startNode, AgentOutput{}, "no_caller_info", "No caller function info provided for AI Agent start")
traceID, _ := ctx.Value("trace_id").(string)
if strings.TrimSpace(caller) == "" {
caller = "unknown"
log.Printf("[WARNING][%s] AI agent: No caller function info provided for AI agent start; defaulting caller=%q", execution.ExecutionId, caller)

Copilot uses AI. Check for mistakes.
Comment thread shared.go
Comment thread ai.go Outdated
Comment on lines +8306 to +8324
b := make([]byte, 6)
rand.Read(b)

mappedDecisions = []AgentDecision{{
I: lastFinishedIndex,
Category: "finish",
Action: "finish",
Tool: "core",
Confidence: 1.0,
Runs: "1",
ApprovalRequired: false,
Reason: reason,
Fields: []Valuereplace{{
Key: "output",
Value: outputValue,
}},
RunDetails: AgentDecisionRunDetails{
Id: base64.RawURLEncoding.EncodeToString(b),
Status: "",
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rand.Read here is from math/rand (not crypto/rand) and the global RNG is never seeded in this file, so the generated RunDetails.Id can be deterministic across process starts and collide. It also ignores the returned error/byte count. Prefer using crypto/rand (with error handling) or an existing UUID generator (uuid.NewV4) to create unique, non-deterministic IDs for decision/run identifiers.

Copilot uses AI. Check for mistakes.
Comment thread codegen.go
Comment thread shared.go
Comment on lines +21816 to +21818
if ctx == nil {
ctx = context.Background()
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ctx==nil guard is a good idea, but the indentation/trailing whitespace here looks inconsistent with gofmt formatting. Running gofmt on this block will keep formatting consistent across the file.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants